191 research outputs found

    Detailed estimation of bioinformatics prediction reliability through the Fragmented Prediction Performance Plots

    Background: An important and yet rather neglected question related to bioinformatics predictions is the estimation of the amount of data that is needed to allow reliable predictions. Bioinformatics predictions are usually validated through a series of figures of merit, such as sensitivity and precision, and little attention is paid to the fact that their performance may depend on the amount of data used to make the predictions themselves. Results: Here I describe a tool, named Fragmented Prediction Performance Plot (FPPP), which monitors the relationship between the prediction reliability and the amount of information underlying the predictions themselves. Three examples of FPPPs are presented to illustrate their principal features. In one example, the reliability becomes independent, over a certain threshold, of the amount of data used to predict protein features, and the intrinsic reliability of the predictor can be estimated. In the other two cases, on the contrary, the reliability strongly depends on the amount of data used to make the predictions and, thus, the intrinsic reliability of the two predictors cannot be determined. Only in the first example is it thus possible to fully quantify the prediction performance. Conclusion: It is thus highly advisable to use FPPPs to determine the performance of any new bioinformatics prediction protocol, in order to fully quantify its prediction power and to allow comparisons between two or more predictors based on different types of data.
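The idea in the abstract above, tracking figures of merit against the amount of data behind each prediction, can be illustrated with a minimal sketch. This is not the author's FPPP implementation: the record layout `(amount_of_data, predicted, actual)`, the fixed-width binning, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict


def sensitivity_precision(pairs):
    """Return (sensitivity, precision) for (predicted, actual) boolean pairs.

    A value is None when its denominator is zero.
    """
    pairs = list(pairs)
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    sensitivity = tp / (tp + fn) if tp + fn else None
    precision = tp / (tp + fp) if tp + fp else None
    return sensitivity, precision


def reliability_by_data_amount(records, bin_width):
    """Group records by amount of data and score each group separately.

    records: iterable of (amount_of_data, predicted, actual) tuples
             (hypothetical layout, not taken from the paper).
    Returns a dict mapping each bin's lower bound to (sensitivity, precision).
    """
    bins = defaultdict(list)
    for amount, predicted, actual in records:
        lower_bound = (amount // bin_width) * bin_width
        bins[lower_bound].append((predicted, actual))
    return {lo: sensitivity_precision(b) for lo, b in sorted(bins.items())}
```

Plotting the per-bin values against the bin lower bounds would show whether reliability levels off above some amount of data or keeps changing with it.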

    Systematic review of antiepileptic drugs’ safety and effectiveness in feline epilepsy

    Understanding the efficacy and safety profile of antiepileptic drugs (AEDs) in feline epilepsy is a crucial consideration for managing this important brain disease. However, there is a lack of information about the treatment of feline epilepsy, and therefore a systematic review was constructed to assess current evidence for the AEDs’ efficacy and tolerability in cats. The methods and materials of our former systematic reviews in canine epilepsy were mostly mirrored for the current systematic review in cats. The PubMed, CAB Direct and Google Scholar databases were searched to detect peer-reviewed studies reporting efficacy and/or adverse effects of AEDs in cats. The studies were assessed with regard to their quality of evidence, i.e. study design, study population, diagnostic criteria and overall risk of bias, and the outcome measures reported, i.e. prevalence and 95% confidence interval of the successful and affected population in each study and in total

    Positively Selected Codons in Immune-Exposed Loops of the Vaccine Candidate OMP-P1 of Haemophilus influenzae

    The high levels of variation in surface epitopes can be considered as an evolutionary hallmark of immune selection. New computational tools enable analysis of this variation by identifying codons that exhibit high rates of amino acid changes relative to the synonymous substitution rate. In the outer membrane protein P1 of Haemophilus influenzae, a vaccine candidate for nontypeable strains, we identified four codons with this attribute in domains that did not correspond to known or assumed B- and T-cell epitopes of OMP-P1. These codons flank hypervariable domains and do not appear to be false positives as judged from parsimony and maximum likelihood analyses. Some closely spaced positively selected codons have been previously considered part of a transmembrane domain, which would render this region unsuited for inclusion in a vaccine. Secondary structure analysis, three-dimensional structural database searches, and homology modeling using FadL of E. coli as a structural homologue, however, revealed that all positively selected codons are located in or near extracellular looping domains. The spacing and level of diversity of these positively selected and exposed codons in OMP-P1 suggest that vaccine targets based on these and conserved flanking residues may provide broad coverage in H. influenzae

    PSP_MCSVM: brainstorming consensus prediction of protein secondary structures using two-stage multiclass support vector machines

    Secondary structure prediction is a crucial task for understanding the variety of protein structures and the biological functions they perform. Prediction of secondary structures for new proteins using their amino acid sequences is of fundamental importance in bioinformatics. We propose a novel technique to predict protein secondary structures based on position-specific scoring matrices (PSSMs) and physicochemical properties of amino acids. It is a two-stage approach involving multiclass support vector machines (SVMs) as classifiers for three different structural conformations, viz., helix, sheet and coil. In the first stage, PSSMs obtained from PSI-BLAST and five specially selected physicochemical properties of amino acids are fed into SVMs as features for sequence-to-structure prediction. Confidence values for forming helix, sheet and coil that are obtained from the first-stage SVM are then used in the second-stage SVM for performing structure-to-structure prediction. The two-stage cascaded classifiers (PSP_MCSVM) are trained with proteins from the RS126 dataset. The classifiers are finally tested on target proteins of the Critical Assessment of protein Structure Prediction experiment 9 (CASP9). PSP_MCSVM with the brainstorming consensus procedure performs better than prediction servers such as Predator, DSC and SIMPA96 for randomly selected proteins from CASP9 targets. The overall performance is found to be comparable with the current state of the art. PSP_MCSVM source code, train-test datasets and supplementary files are freely available in the public domain at: http://sysbio.icm.edu.pl/secstruct and http://code.google.com/p/cmater-bioinfo

    Accurate prediction of protein secondary structure and solvent accessibility by consensus combiners of sequence and structure information

    Background: Structural properties of proteins such as secondary structure and solvent accessibility contribute to three-dimensional structure prediction, not only in the ab initio case but also when homology information to known structures is available. Structural properties are also routinely used in protein analysis even when homology is available, largely because homology modelling is lower throughput than, say, secondary structure prediction. Nonetheless, predictors of secondary structure and solvent accessibility are virtually always ab initio. Results: Here we develop high-throughput machine learning systems for the prediction of protein secondary structure and solvent accessibility that exploit homology to proteins of known structure, where available, in the form of simple structural frequency profiles extracted from sets of PDB templates. We compare these systems to their state-of-the-art ab initio counterparts, and with a number of baselines in which secondary structures and solvent accessibilities are extracted directly from the templates. We show that structural information from templates greatly improves secondary structure and solvent accessibility prediction quality, and that, on average, the systems significantly enrich the information contained in the templates. For sequence similarity exceeding 30%, secondary structure prediction quality is approximately 90%, close to its theoretical maximum, and 2-class solvent accessibility roughly 85%. Gains are robust with respect to template selection noise, and significant for marginal sequence similarity and for short alignments, supporting the claim that these improved predictions may prove beneficial beyond the case in which clear homology is available. Conclusion: The predictive systems are publicly available at http://distill.ucd.ie. Funding: Science Foundation Ireland; Irish Research Council for Science, Engineering and Technology; Health Research Board; UCD President's Award 2004.

    Prediction of backbone dihedral angles and protein secondary structure using support vector machines

    Background: The prediction of the secondary structure of a protein is a critical step in the prediction of its tertiary structure and, potentially, its function. Moreover, the backbone dihedral angles, highly correlated with secondary structures, provide crucial information about the local three-dimensional structure. Results: We predict independently both the secondary structure and the backbone dihedral angles and combine the results in a loop to enhance each prediction reciprocally. Support vector machines, a state-of-the-art supervised classification technique, achieve secondary structure predictive accuracy of 80% on a non-redundant set of 513 proteins, significantly higher than other methods on the same dataset. The dihedral angle space is divided into a number of regions using two unsupervised clustering techniques in order to predict the region to which a new residue belongs. The performance of our method is comparable to, and in some cases more accurate than, other multi-class dihedral prediction methods. Conclusions: We have created an accurate predictor of backbone dihedral angles and secondary structure. Our method, called DISSPred, is available online at http://comp.chem.nottingham.ac.uk/disspred/.

    Properties and identification of antibiotic drug targets

    Background: We analysed 48 non-redundant antibiotic target proteins from all bacteria, 22 antibiotic target proteins from E. coli only and 4243 non-drug targets from E. coli to identify differences in their properties and to predict new potential drug targets. Results: When compared to non-targets, bacterial antibiotic targets tend to be long, have high β-sheet and low α-helix contents, are polar, are found in the cytoplasm rather than in membranes, and are usually enzymes, with ligases particularly favoured. Sequence features were used to build a support vector machine model for E. coli proteins, allowing the assignment of any sequence to the drug target or non-target classes, with an accuracy in the training set of 94%. We identified 319 proteins (7%) in the non-target set that have target-like properties, many of which have unknown function. Of these, 63 proteins have significant and undesirable similarity to a human protein, leaving 256 target-like proteins that are not present in humans. Conclusions: We suggest that antibiotic discovery programs would be more likely to succeed if new targets are chosen from this set of target-like proteins or their homologues. In particular, 64 are essential genes where the cell is not able to recover from a random insertion disruption.

    PCI-SS: MISO dynamic nonlinear protein secondary structure prediction

    Background: Since the function of a protein is largely dictated by its three-dimensional configuration, determining a protein's structure is of fundamental importance to biology. Here we report on a novel approach to determining the one-dimensional secondary structure of proteins (distinguishing α-helices, β-strands, and non-regular structures) from primary sequence data, which makes use of Parallel Cascade Identification (PCI), a powerful technique from the field of nonlinear system identification. Results: Using PSI-BLAST divergent evolutionary profiles as input data, dynamic nonlinear systems are built through a black-box approach to model the process of protein folding. Genetic algorithms (GAs) are applied in order to optimize the architectural parameters of the PCI models. The three-state prediction problem is broken down into a combination of three binary sub-problems, and protein structure classifiers are built using 2 layers of PCI classifiers. Careful construction of the optimization, training, and test datasets ensures that no homology exists between any training and testing data. A detailed comparison between PCI and 9 contemporary methods is provided over a set of 125 new protein chains guaranteed to be dissimilar to all training data. Unlike other secondary structure prediction methods, here a web service is developed to provide both human- and machine-readable interfaces to PCI-based protein secondary structure prediction. This server, called PCI-SS, is available at http://bioinf.sce.carleton.ca/PCISS. In addition to a dynamic PHP-generated web interface for humans, a Simple Object Access Protocol (SOAP) interface is added to permit invocation of the PCI-SS service remotely. This machine-readable interface facilitates incorporation of PCI-SS into multi-faceted systems biology analysis pipelines requiring protein secondary structure information, and greatly simplifies high-throughput analyses. XML is used to represent the input protein sequence data and also to encode the resulting structure prediction in a machine-readable format. To our knowledge, this represents the only publicly available SOAP interface for a protein secondary structure prediction service with a published WSDL interface definition. Conclusion: Relative to the 9 contemporary methods included in the comparison, cascaded PCI classifiers perform well; however, PCI finds greatest application as a consensus classifier. When PCI is used to combine a sequence-to-structure PCI-based classifier with the current leading ANN-based method, PSIPRED, the overall error rate (Q3) is maintained while the rate of occurrence of a particularly detrimental error is reduced by up to 25%. This improvement in BAD score, combined with the machine-readable SOAP web service interface, makes PCI-SS particularly useful for inclusion in a tertiary structure prediction pipeline.
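The decomposition of the three-state problem into three binary sub-problems, and the Q3 measure used to score the result, can be sketched as follows. This is only an illustration: the paper combines the binary outputs with a second layer of PCI classifiers, whereas the sketch swaps in a plain highest-score rule. The state letters `H`, `E`, `C` and both function names are assumptions. Q3 is computed here in its conventional sense, as the fraction of residues assigned the correct one of the three states.

```python
def combine_binary_scores(helix, strand, coil):
    """Pick, per residue, the state whose one-vs-rest score is highest.

    helix, strand, coil: equal-length sequences of per-residue scores from
    three hypothetical binary classifiers. Ties resolve in the order H, E, C.
    """
    states = "HEC"
    return "".join(
        states[max(range(3), key=lambda k: scores[k])]
        for scores in zip(helix, strand, coil)
    )


def q3(predicted, observed):
    """Fraction of residues whose three-state label matches the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)
```

A consensus classifier of the kind described in the abstract would replace the highest-score rule with a trained combiner that also takes another method's output as input.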

    Regional differentiation of felid vertebral column evolution: a study of 3D shape trajectories

    Recent advances in geometric morphometrics provide improved techniques for extraction of biological information from shape and have greatly contributed to the study of ecomorphology and morphological evolution. However, the vertebral column remains an under-studied structure due in part to a concentration on skull and limb research, but most importantly because of the difficulties in analysing the shape of a structure composed of multiple articulating discrete units (i.e. vertebrae). Here, we have applied a variety of geometric morphometric analyses to three-dimensional landmarks collected on 19 presacral vertebrae to investigate the influence of potential ecological and functional drivers, such as size, locomotion and prey size specialisation, on regional morphology of the vertebral column in the mammalian family Felidae. In particular, we have here provided a novel application of a method—phenotypic trajectory analysis (PTA)—that allows for shape analysis of a contiguous sequence of vertebrae as functionally linked osteological structures. Our results showed that ecological factors influence the shape of the vertebral column heterogeneously and that distinct vertebral sections may be under different selection pressures. While anterior presacral vertebrae may either have evolved under stronger phylogenetic constraints or are ecologically conservative, posterior presacral vertebrae, specifically in the post-T10 region, show significant differentiation among ecomorphs. Additionally, our PTA results demonstrated that functional vertebral regions differ among felid ecomorphs mainly in the relative covariation of vertebral shape variables (i.e. direction of trajectories, rather than in trajectory size) and, therefore, that ecological divergence among felid species is reflected by morphological changes in vertebral column shape

    Classification and evolutionary history of the single-strand annealing proteins, RecT, Redβ, ERF and RAD52

    BACKGROUND: The DNA single-strand annealing proteins (SSAPs), such as RecT, Redβ, ERF and Rad52, function in RecA-dependent and RecA-independent DNA recombination pathways. Recently, they have been shown to form similar helical quaternary superstructures. However, despite the functional similarities between these diverse SSAPs, their actual evolutionary affinities are poorly understood. RESULTS: Using sensitive computational sequence analysis, we show that the RecT and Redβ proteins, along with several other bacterial proteins, form a distinct superfamily. The ERF and Rad52 families show no direct evolutionary relationship to these proteins and define novel superfamilies of their own. We identify several previously unknown members of each of these superfamilies and also report, for the first time, bacterial and viral homologs of Rad52. Additionally, we predict the presence of aberrant HhH modules in RAD52 that are likely to be involved in DNA-binding. Using the contextual information obtained from the analysis of gene neighborhoods, we provide evidence of the interaction of the bacterial members of each of these SSAP superfamilies with a similar set of DNA repair/recombination proteins. These include different nucleases or Holliday junction resolvases, the ABC ATPase SbcC and the single-strand-binding protein. We also present evidence of independent assembly of some of the predicted operons encoding SSAPs and in situ displacement of functionally similar genes. CONCLUSIONS: There are three evolutionarily distinct superfamilies of SSAPs, namely the RecT/Redβ, ERF, and RAD52 superfamilies, that have different sequence conservation patterns and predicted folds. All these SSAPs appear to be primarily of bacteriophage origin and have been acquired by numerous phylogenetically distant cellular genomes. They generally occur in predicted operons encoding one or more of a set of conserved DNA recombination proteins that appear to be the principal functional partners of the SSAPs